Mapping Tree Diversity: Analyzing Tree Traits Across Vancouver’s Neighborhoods

This exploratory data analysis investigates spatial patterns in the Vancouver Street Trees dataset. Specifically, it explores how trees are distributed across neighborhoods in terms of abundance, size, and species diversity.

Urban trees contribute significantly to environmental and social well-being in cities—they provide shade, improve air quality, reduce urban heat, and enhance neighborhood livability. Understanding how tree characteristics vary between neighborhoods can support more equitable and effective urban planning, sustainability efforts, and biodiversity initiatives.

This analysis uses a subset of the full dataset containing 5,000 entries. It focuses on key features such as neighbourhood_name, diameter, height_range_id, and genus_name to identify patterns and trends in Vancouver’s urban forest.

Questions of Interest

  1. Which neighborhoods have the highest and lowest number of trees?

  2. Is there a relationship between neighborhood and average tree diameter (or height range)?

  3. Are certain neighborhoods dominated by specific genera?

Analysis

We’ll start by loading the necessary libraries and reading in the Vancouver Street Trees dataset.

import pandas as pd
import altair as alt

# Load dataset
url = "https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv"
trees_df = pd.read_csv(url)
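As an optional alternative, pd.read_csv can parse only the columns this analysis relies on through its usecols parameter. This is a minimal sketch, not the original workflow: it assumes only the four analysis columns are needed, reads a small in-memory CSV with hypothetical values so it runs offline, and introduces the name analysis_cols for illustration. With the real file, the equivalent call would be pd.read_csv(url, usecols=analysis_cols); the summary that follows still uses the full trees_df.

```python
import io

import pandas as pd

# Toy CSV standing in for the real file (hypothetical values, for illustration only)
csv_text = """tree_id,neighbourhood_name,genus_name,diameter,height_range_id,date_planted
1,Renfrew-Collingwood,ACER,12.0,2,
2,Renfrew-Collingwood,PRUNUS,8.5,1,2004-02-16
3,Kitsilano,ACER,20.0,4,
"""

# Columns the analysis actually uses (assumed list)
analysis_cols = ["neighbourhood_name", "genus_name", "diameter", "height_range_id"]

# usecols restricts parsing to the listed columns; all others are skipped
trees_subset = pd.read_csv(io.StringIO(csv_text), usecols=analysis_cols)
print(trees_subset.shape)
print(list(trees_subset.columns))
```

Note that usecols keeps the file's column order, not the order of the list passed in.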

Summarizing the Data

Next, we review the structure and summary statistics of the dataset:

trees_df.info()
trees_df.describe(include='all')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 21 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Unnamed: 0          5000 non-null   int64  
 1   std_street          5000 non-null   object 
 2   on_street           5000 non-null   object 
 3   species_name        5000 non-null   object 
 4   neighbourhood_name  5000 non-null   object 
 5   date_planted        2363 non-null   object 
 6   diameter            5000 non-null   float64
 7   street_side_name    5000 non-null   object 
 8   genus_name          5000 non-null   object 
 9   assigned            5000 non-null   object 
 10  civic_number        5000 non-null   int64  
 11  plant_area          4950 non-null   object 
 12  curb                5000 non-null   object 
 13  tree_id             5000 non-null   int64  
 14  common_name         5000 non-null   object 
 15  height_range_id     5000 non-null   int64  
 16  on_street_block     5000 non-null   int64  
 17  cultivar_name       2658 non-null   object 
 18  root_barrier        5000 non-null   object 
 19  latitude            5000 non-null   float64
 20  longitude           5000 non-null   float64
dtypes: float64(3), int64(5), object(13)
memory usage: 820.4+ KB
Unnamed: 0 std_street on_street species_name neighbourhood_name date_planted diameter street_side_name genus_name assigned ... plant_area curb tree_id common_name height_range_id on_street_block cultivar_name root_barrier latitude longitude
count 5000.000000 5000 5000 5000 5000 2363 5000.000000 5000 5000 5000 ... 4950 5000 5000.000000 5000 5000.00000 5000.000000 2658 5000 5000.000000 5000.000000
unique NaN 603 607 171 22 1599 NaN 4 67 2 ... 38 2 NaN 361 NaN NaN 176 2 NaN NaN
top NaN CAMBIE ST CAMBIE ST SERRULATA Renfrew-Collingwood 2004-02-16 NaN ODD ACER N ... 10 Y NaN KWANZAN FLOWERING CHERRY NaN NaN KWANZAN N NaN NaN
freq NaN 52 49 463 384 7 NaN 2554 1218 4564 ... 736 4593 NaN 383 NaN NaN 383 4679 NaN NaN
mean 14861.920400 NaN NaN NaN NaN NaN 12.340888 NaN NaN NaN ... NaN NaN 128682.584600 NaN 2.73440 2960.227000 NaN NaN 49.247349 -123.107128
std 8680.023278 NaN NaN NaN NaN NaN 9.266600 NaN NaN NaN ... NaN NaN 75412.260406 NaN 1.56957 2086.861052 NaN NaN 0.021251 0.049137
min 2.000000 NaN NaN NaN NaN NaN 0.000000 NaN NaN NaN ... NaN NaN 36.000000 NaN 0.00000 0.000000 NaN NaN 49.202783 -123.220560
25% 7192.750000 NaN NaN NaN NaN NaN 4.000000 NaN NaN NaN ... NaN NaN 61321.500000 NaN 2.00000 1300.000000 NaN NaN 49.230152 -123.144178
50% 14870.000000 NaN NaN NaN NaN NaN 10.000000 NaN NaN NaN ... NaN NaN 130130.500000 NaN 2.00000 2600.000000 NaN NaN 49.247981 -123.105861
75% 22366.750000 NaN NaN NaN NaN NaN 18.000000 NaN NaN NaN ... NaN NaN 191332.000000 NaN 4.00000 4100.000000 NaN NaN 49.263275 -123.063484
max 29992.000000 NaN NaN NaN NaN NaN 71.000000 NaN NaN NaN ... NaN NaN 270750.000000 NaN 9.00000 9100.000000 NaN NaN 49.293930 -123.023311

11 rows × 21 columns

The dataset contains 5,000 rows and 21 columns. It provides details on street trees in Vancouver, including their genus, species, diameter, height range, and the neighborhood in which they are located.

Although the dataset includes many attributes, this analysis will focus on the columns most relevant to answering the research questions:

  • genus_name: Used to compare tree distributions across different parts of Vancouver.

  • neighbourhood_name: Helps identify patterns in tree dominance and biodiversity in neighborhoods.

  • diameter: Can act as a rough estimate for tree maturity.

  • height_range_id: An ordinal code for the tree’s height class (a category rather than a measured height), useful for understanding how tree size varies by location.

Columns with extensive missing data, such as date_planted and cultivar_name, will be excluded from further analysis to ensure data quality and maintain focus on key variables.
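The exclusion rule can be made explicit by computing each column’s missing fraction. In the info() output above, date_planted is about 53% missing (2,363 of 5,000 non-null) and cultivar_name about 47% missing (2,658 non-null), so a cutoff of 0.4 would drop both while keeping plant_area (1% missing). That cutoff and the toy frame below are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy frame mimicking the missingness pattern in the real data
toy = pd.DataFrame({
    "genus_name": ["ACER", "PRUNUS", "ACER", "FRAXINUS"],
    "diameter": [12.0, 8.5, 20.0, 3.0],
    "date_planted": [None, "2004-02-16", None, None],
    "cultivar_name": ["KWANZAN", None, None, "CRIMSON KING"],
})

MAX_MISSING = 0.4  # assumed cutoff, not taken from the original analysis

# Fraction of missing values in each column
missing_share = toy.isna().mean()

# Keep only the columns whose missing fraction is at or below the cutoff
kept = toy.loc[:, missing_share <= MAX_MISSING]
print(missing_share.round(2).to_dict())
print(list(kept.columns))
```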

Figure 1: Tree Height Distribution by Neighborhood (Top 10)

This faceted histogram visualizes the distribution of tree height classes (height_range_id) across the 10 neighborhoods with the most street trees in Vancouver. Each facet represents one neighborhood, allowing for easy comparison of tree height diversity across these areas.

By focusing on neighborhoods with the most trees, we can identify which areas tend to have taller or shorter trees on average, and observe the variation in tree size distribution. This provides insights into the structural diversity of Vancouver’s urban forest in its most populated green areas.

# Create a new list of the top 10 neighborhoods by tree count
top_neighbs = trees_df['neighbourhood_name'].value_counts().nlargest(10).index.tolist()

# Filter trees_df to the top 10 neighborhoods; .copy() avoids a SettingWithCopyWarning when columns are added later
top_trees_df = trees_df[trees_df['neighbourhood_name'].isin(top_neighbs)].copy()

# create a histogram of tree height distribution by neighborhood
height_hist = alt.Chart(top_trees_df).mark_bar().encode(
    alt.X('height_range_id:O', title='Height Range ID'),  # ordinal class code, not a height in meters
    y='count()',
    color=alt.Color('height_range_id:Q', legend=None)
).properties(
    width=300,
    height=120
).facet(
    facet='neighbourhood_name:N',
    columns=2
).resolve_scale(
    y='independent'
).properties(
    title='Tree Height Distribution by Neighborhood (Top 10 Neighborhoods)'
)

height_hist
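Because height_range_id is an ordinal class code rather than a continuous measurement, a per-neighborhood median is a reasonable numeric companion to the histogram when addressing Question 2. A minimal sketch on hypothetical toy rows; in the notebook, toy would be replaced by top_trees_df:

```python
import pandas as pd

# Hypothetical toy rows; in the notebook this would be top_trees_df
toy = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano", "Kitsilano", "Kitsilano",
                           "Sunset", "Sunset", "Sunset"],
    "height_range_id": [2, 4, 5, 1, 2, 2],
})

# Median, range, and row count of the ordinal height code per neighborhood
height_summary = (
    toy.groupby("neighbourhood_name")["height_range_id"]
    .agg(median="median", min="min", max="max", n="size")
    .sort_values("median", ascending=False)
)
print(height_summary)
```

The median is used instead of the mean because averaging class codes assumes equal spacing between classes, which an ID column does not guarantee.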

Figure 2: Distribution of Tree Counts Among Vancouver’s Top 10 Neighborhoods

The bar chart below ranks the 10 neighborhoods in top_trees_df by number of street trees. Narrowing the view to the most heavily planted areas sets up the genus-diversity comparison later in the analysis.

top_tree_count = alt.Chart(top_trees_df).mark_bar().encode(
    alt.X('count()', title='Number of Trees'),
    alt.Y('neighbourhood_name', sort='x', title='Neighborhood')
).properties(
    title='Distribution of Tree Counts Among Vancouver’s Top 10 Neighborhoods',
    width=600,
    height=400
)

top_tree_count
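The same value_counts().nlargest(...) step that defines top_trees_df can also report how much of the city’s tree count those neighborhoods hold. A minimal sketch on toy data with n = 2 (the notebook uses 10); the labels A to D are hypothetical:

```python
import pandas as pd

# Hypothetical toy data; the notebook uses trees_df and n = 10
toy = pd.DataFrame({
    "neighbourhood_name": ["A", "A", "A", "B", "B", "C", "D"],
})

n = 2
counts = toy["neighbourhood_name"].value_counts()
top_names = counts.nlargest(n).index.tolist()

# Share of all trees that fall in the top-n neighborhoods
top_share = counts.loc[top_names].sum() / counts.sum()
print(top_names, round(top_share, 3))
```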

Figure 2b: Tree Count Across All Neighborhoods

Replacing top_trees_df with trees_df extends the same chart to all 22 neighborhoods in the dataset, which makes it easier to identify both the highest and lowest tree count neighborhoods.

total_tree_count = alt.Chart(trees_df).mark_bar().encode(
    x=alt.X('count()', title='Number of Trees'),
    y=alt.Y('neighbourhood_name', sort='x', title='Neighborhood')
).properties(
    title='Tree Count by Neighborhood',
    width=600,
    height=400
)

total_tree_count
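To answer Question 1 in tabular form, Series.nlargest and Series.nsmallest can pull both ends of the count distribution. A minimal sketch on hypothetical toy data; applied to trees_df, the same calls would return the actual highest and lowest neighborhoods:

```python
import pandas as pd

# Hypothetical toy data; in the notebook this would be trees_df
toy = pd.DataFrame({
    "neighbourhood_name": ["A"] * 5 + ["B"] * 3 + ["C"] * 2 + ["D"],
})

counts = toy["neighbourhood_name"].value_counts()

k = 2  # how many neighborhoods to report at each end
highest = counts.nlargest(k)
lowest = counts.nsmallest(k)
print(highest.to_dict())  # {'A': 5, 'B': 3}
print(lowest.to_dict())   # {'D': 1, 'C': 2}
```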

Figure 3: Tree Diameter Distribution Across Top 10 Neighborhoods

The boxplot created below explores how tree sizes vary across neighborhoods by capturing medians, spread, and outliers. These patterns may reflect differences in neighborhood development timelines or maintenance practices.

diameter_boxplot = alt.Chart(top_trees_df).mark_boxplot().encode(
    alt.X('diameter', title='Tree Diameter (in)'),
    alt.Y('neighbourhood_name', title='Neighborhood'),
).properties(
    title='Tree Diameter Distribution Across Top 10 Neighborhoods',
    width=600,
    height=400
)

# New annotation dataframe, which holds the location of the annotation text
annotation_df = pd.DataFrame({
    'diameter': [100],
    'neighbourhood_name': ['Renfrew-Collingwood'],
    'label': ['High median may indicate older, established trees']
})

# Creating the annotation
annotation = alt.Chart(annotation_df).mark_text(
    fontStyle='italic',
    align='left',
    baseline='middle',
    dx=10
).encode(
    x='diameter:Q',
    y='neighbourhood_name:N',
    text='label:N'
)

# Combine the boxplot and annotation
tree_diameter = diameter_boxplot + annotation

tree_diameter
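Altair’s mark_boxplot renders a Vega-Lite boxplot, whose whiskers by default extend 1.5 × IQR beyond the quartiles, with points past that drawn as outliers. The sketch below reproduces the median, IQR, and outlier count per neighborhood in pandas, which is one way to check an annotation claim such as the one placed on Renfrew-Collingwood. The toy diameters are hypothetical, and Vega-Lite’s quartile interpolation may differ slightly from the pandas default.

```python
import pandas as pd

# Hypothetical toy diameters; in the notebook this would be top_trees_df
toy = pd.DataFrame({
    "neighbourhood_name": ["X"] * 6 + ["Y"] * 6,
    "diameter": [2, 4, 6, 8, 10, 40,
                 10, 12, 14, 16, 18, 20],
})

def iqr(d: pd.Series) -> float:
    """Interquartile range (Q3 - Q1)."""
    return d.quantile(0.75) - d.quantile(0.25)

def n_outliers(d: pd.Series) -> int:
    """Count points beyond the 1.5 x IQR fences (Tukey rule)."""
    q1, q3 = d.quantile(0.25), d.quantile(0.75)
    fence = 1.5 * (q3 - q1)
    return int(((d < q1 - fence) | (d > q3 + fence)).sum())

# Named aggregation: one row per neighborhood, one column per statistic
stats = toy.groupby("neighbourhood_name")["diameter"].agg(
    median="median", iqr=iqr, n_outliers=n_outliers
)
print(stats)
```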

Figure 4: Tree Genus Concentration by Neighborhood

To explore whether certain neighborhoods are dominated by specific tree genera, the heatmap below shows the count of each genus across the top 10 neighborhoods. This lets us compare two categorical variables (neighborhood and genus) and visualize frequency patterns. Genera with fewer than 30 trees are grouped into a single category labeled “OTHER” to reduce clutter and improve readability.

# Count total number of trees per genus
genus_counts = top_trees_df['genus_name'].value_counts()

# Relabel rare genera (fewer than 30 trees) as a single 'OTHER' category to reduce clutter
top_trees_df['genus_grouped'] = top_trees_df['genus_name'].where(
    top_trees_df['genus_name'].map(genus_counts) >= 30, 'OTHER'
)

heatmap = alt.Chart(top_trees_df).mark_rect().encode(
    alt.X('genus_grouped:N', title='Genus'),
    alt.Y('neighbourhood_name:N', title='Neighborhood'),
    alt.Color('count()', title='Tree Count', scale=alt.Scale(scheme='magma')),
    tooltip=['neighbourhood_name:N', 'genus_grouped:N', 'count()']
).properties(
    title='Tree Genus Concentration by Neighborhood'
)

heatmap
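Each heatmap cell’s color is a count() over one (neighbourhood, genus) pair, which is the same table pd.crosstab builds. A minimal sketch of the rare-genus grouping followed by the crosstab, on hypothetical toy rows, with the cutoff lowered from 30 to 2 so the example has a genus to relabel:

```python
import pandas as pd

# Hypothetical toy rows; the notebook applies this to top_trees_df with a cutoff of 30
toy = pd.DataFrame({
    "neighbourhood_name": ["A", "A", "A", "B", "B", "B"],
    "genus_name": ["ACER", "ACER", "TILIA", "ACER", "PRUNUS", "PRUNUS"],
})

MIN_COUNT = 2  # cutoff scaled down for the toy data

# Map each row's genus to its overall frequency, then relabel rare genera
genus_freq = toy["genus_name"].map(toy["genus_name"].value_counts())
toy["genus_grouped"] = toy["genus_name"].where(genus_freq >= MIN_COUNT, "OTHER")

# The counts the heatmap colors encode: rows = neighborhoods, columns = genera
table = pd.crosstab(toy["neighbourhood_name"], toy["genus_grouped"])
print(table)
```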

Discussion

This analysis explored how Vancouver’s street trees vary across neighborhoods in terms of abundance, size, and species diversity. Several key patterns emerged:

  • Certain neighborhoods, such as Renfrew-Collingwood and Kensington-Cedar Cottage, have noticeably higher numbers of street trees. This raises interesting questions about the factors influencing tree distribution, including neighborhood size, development history, urban planning practices, and the relative age or maturity of these areas.

  • Some neighborhoods show a wider spread or higher median tree diameters, which may point to older, more established tree populations. Separately, extreme outliers within a neighborhood suggest a mix of tree maturity levels, which could inform maintenance or replacement priorities.

  • In terms of genus diversity, certain areas are dominated by a single genus like Acer, while others host a broader mix. These patterns may reflect municipal planting strategies or environmental factors like sunlight, soil, and space availability.
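One way to put numbers on “dominated by a single genus” versus “a broader mix”, not used in the original analysis, is to report each neighborhood’s top-genus share alongside the Shannon diversity index H = Σ p_i ln(1/p_i), where p_i is the share of trees in genus i. H is 0 when one genus accounts for every tree and grows as trees spread more evenly across more genera. A minimal sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; in the notebook this would be top_trees_df
toy = pd.DataFrame({
    "neighbourhood_name": ["A"] * 4 + ["B"] * 4,
    "genus_name": ["ACER", "ACER", "ACER", "ACER",
                   "ACER", "PRUNUS", "TILIA", "FRAXINUS"],
})

# Proportion of each genus within each neighborhood (MultiIndex: neighborhood, genus)
props = toy.groupby("neighbourhood_name")["genus_name"].value_counts(normalize=True)

summary = pd.DataFrame({
    # Largest single-genus share: 1.0 means one genus only
    "top_genus_share": props.groupby(level=0).max(),
    # Shannon index H = sum(p * ln(1/p)) over the genera present
    "shannon_h": props.groupby(level=0).apply(lambda p: float((p * np.log(1 / p)).sum())),
})
print(summary.round(3))
```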

Overall, the findings generally aligned with expectations, but they also highlighted new questions about the historical and environmental influences on Vancouver’s urban forest. Further analysis could incorporate data on neighborhood development, proximity to green spaces, or socioeconomic factors to better understand what drives these patterns — and how they relate to broader issues like urban planning, environmental equity, and community resilience.

Concluding Remarks

This analysis revealed meaningful variations in Vancouver’s urban tree population across neighborhoods, in terms of quantity, size, and species composition. While some neighborhoods have abundant and mature trees, others show more diversity or potential maintenance needs. These insights contribute to understanding the ecological and planning dynamics shaping the city’s urban forest.

The findings not only confirm expected patterns but also open avenues for further exploration, especially by integrating additional data sources like real estate values or neighborhood quality. Such work could help inform more equitable and resilient urban greening strategies moving forward.

Dashboard

The dashboard below was created to showcase the relationship between tree genus concentration and tree count distribution in Vancouver’s top 10 neighborhoods. Users can interact with the dashboard by selecting a neighborhood directly from the bar chart, which filters the heatmap below to show the distribution of genera within that area. Additionally, a genus dropdown allows users to focus on specific tree types, revealing patterns in where certain genera are planted. This dual-filter approach helps highlight variations in tree diversity across neighborhoods.

With more time, I would integrate real estate data, such as average property values or rental rates by neighborhood, to explore whether there’s a correlation between urban tree diversity and housing economics. This could provide deeper insight into how ecological investment and biodiversity intersect with affordability, equity, or gentrification in Vancouver.

# Create genus dropdown widget
genus_dropdown = alt.binding_select(
    options=sorted(top_trees_df['genus_grouped'].unique()),
    name='Select Genus: '
)

genus_selection = alt.selection_point(
    fields=['genus_grouped'],
    bind=genus_dropdown
)

# Create clickable bar chart for neighborhoods
neighbourhood_selection = alt.selection_point(
    fields=['neighbourhood_name']
)

top_tree_count_interactive = top_tree_count.encode(
    color=alt.condition(
        neighbourhood_selection,
        alt.value('orange'),
        alt.value('lightgray')
    )
).add_params(
    neighbourhood_selection,
    genus_selection
).transform_filter(
    genus_selection
)

# Filter the heatmap by both selections (the params are defined once, on the bar chart above)
heatmap_interactive = heatmap.transform_filter(
    neighbourhood_selection
).transform_filter(
    genus_selection
)

# Combine charts into dashboard
dashboard = (top_tree_count_interactive & heatmap_interactive).resolve_scale(
    color='independent'
)
dashboard
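The dashboard combines two filters: the genus dropdown applies to both charts, while a clicked neighborhood filters the heatmap. With nothing selected, Altair point selections keep all rows by default. The hypothetical helper dashboard_view below mimics in pandas what the heatmap should count for a given pair of selections, which can help sanity-check the interactive chart against the data. It is a sketch under those assumptions, not part of Altair.

```python
from typing import Optional

import pandas as pd

def dashboard_view(df: pd.DataFrame,
                   neighbourhood: Optional[str] = None,
                   genus: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical helper: rows the heatmap would count for a given selection.

    None stands for an empty selection, which (like Altair's default) keeps all rows.
    """
    mask = pd.Series(True, index=df.index)
    if neighbourhood is not None:
        mask &= df["neighbourhood_name"] == neighbourhood
    if genus is not None:
        mask &= df["genus_grouped"] == genus
    return df[mask]

# Hypothetical toy rows with the grouped genus column already added
toy = pd.DataFrame({
    "neighbourhood_name": ["A", "A", "B", "B"],
    "genus_grouped": ["ACER", "OTHER", "ACER", "ACER"],
})

print(len(dashboard_view(toy)))                     # 4
print(len(dashboard_view(toy, neighbourhood="B")))  # 2
print(len(dashboard_view(toy, "A", "ACER")))        # 1
```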

References

Not all of the work in this notebook is original. Some techniques and ideas were informed by publicly available resources and course materials. These elements were used solely for educational purposes.

Resources Used

• Data Source – The cleaned and filtered Vancouver Street Trees dataset (5,000 rows) was provided by instructors of the “Data Visualization” course through the University of British Columbia (UBC) Key Capabilities in Data Science certificate. This data source is a subset of the original data obtained from the City of Vancouver Open Data Portal under the Open Government License – Vancouver.

• Data Wrangling Approach – The data cleaning and transformation steps, specifically filtering for a new dataframe, were guided by methods learned in the “Programming in Python” course from the Key Capabilities in Data Science certificate at UBC.

• Data Visualization – Visualizations were created by me using Altair, guided by course materials from the “Data Visualization” course within the UBC Key Capabilities in Data Science certificate.

• Attribution – Portions of code structuring, explanation clarity, and troubleshooting were assisted through conversational support with ChatGPT, an AI language model by OpenAI. All final decisions, implementations, and interpretations were completed independently.